CS224N Natural Language Processing with Deep Learning Assignment 3
Course homepage: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1194/
Lecture videos: https://www.bilibili.com/video/av46216519?from=search&seid=13229282510647565239
This post reviews CS224N Assignment 3. Reference solution:
https://github.com/ZacBi/CS224n-2019-solutions
1. Machine Learning & Neural Networks
(a)
(i)
Momentum updates the parameters with an exponentially weighted average of past gradients rather than the current gradient alone, so the update direction changes only slowly from step to step. This lower variance damps oscillation, especially along directions where the gradient keeps flipping sign.
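A toy sketch of this effect (pure Python; the scalar parameter, alternating gradients, and the constants `lr` and `beta1` are all made up for illustration):

```python
def momentum_step(theta, m, grad, lr=0.1, beta1=0.9):
    # keep a running average of gradients, then step along the average
    m = beta1 * m + (1 - beta1) * grad
    theta = theta - lr * m
    return theta, m

# Gradients alternating between +1 and -1 make plain SGD bounce back and
# forth by lr = 0.1 every step; the averaged update stays near zero.
theta, m = 0.0, 0.0
for step in range(10):
    grad = 1.0 if step % 2 == 0 else -1.0
    theta, m = momentum_step(theta, m, grad)
```

After ten steps the averaged gradient `m` never exceeds 0.1 in magnitude, so each momentum update is at most a tenth of the corresponding plain-SGD update.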
(ii)
Adam divides each parameter's update by the square root of a running average of its squared gradients, so parameters with consistently small gradients receive relatively larger updates and parameters with large gradients receive smaller ones. This makes the update magnitudes comparable across directions, which reduces oscillation and helps progress along flat directions of the loss.
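A one-step sketch of this rescaling (my own toy function, not the full Adam update; it omits bias correction and starts from $m = v = 0$):

```python
def adam_direction(grad, beta1=0.9, beta2=0.999, eps=1e-8):
    # one Adam-style step from m = v = 0: the update is m / (sqrt(v) + eps)
    m = (1 - beta1) * grad
    v = (1 - beta2) * grad ** 2
    return m / (v ** 0.5 + eps)

# A direction with a huge gradient and one with a tiny gradient end up
# with nearly identical update magnitudes after the rescaling.
big = adam_direction(100.0)
small = adam_direction(0.01)
```

Both calls return roughly $(1-\beta_1)/\sqrt{1-\beta_2} \approx 3.16$ regardless of the raw gradient scale, which is exactly the equalizing behavior described above.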
(b)
(i)
Since $h_{drop} = \gamma\, d \circ h$ with $d_i \sim \mathrm{Bernoulli}(1-p_{drop})$, we have $\mathbb{E}_{p_{drop}}[h_{drop}]_i = \gamma (1-p_{drop}) h_i$. Setting this equal to $h_i$ gives
$$\gamma = \frac{1}{1-p_{drop}}.$$
(ii)
Dropout is applied during training so that the network effectively samples many different sub-networks, which acts as regularization and improves generalization. At evaluation time we want a single deterministic, accurate prediction from the full model, so dropout is turned off.
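The scaling from (i) can be checked with a quick Monte Carlo estimate (pure-Python sketch; the values of `p_drop`, `h`, and the trial count are arbitrary choices for illustration):

```python
import random

random.seed(0)
p_drop = 0.3
gamma = 1.0 / (1.0 - p_drop)   # the scaling factor derived above
h = 2.0                        # a single pre-dropout activation
trials = 100_000

# Each trial zeroes h with probability p_drop, otherwise keeps gamma * h.
total = sum(0.0 if random.random() < p_drop else gamma * h
            for _ in range(trials))
mean = total / trials          # empirical E[h_drop], should be close to h
```

With $\gamma = 1/(1-p_{drop})$ the empirical mean lands on $h$ up to sampling noise, confirming that the expected activation is unchanged by dropout.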
2. Neural Transition-Based Dependency Parsing
(a)
| Stack | Buffer | New dependency | Transition |
|---|---|---|---|
| [ROOT] | [I, parsed, this, sentence, correctly] | | Initial Configuration |
| [ROOT, I] | [parsed, this, sentence, correctly] | | SHIFT |
| [ROOT, I, parsed] | [this, sentence, correctly] | | SHIFT |
| [ROOT, parsed] | [this, sentence, correctly] | parsed$\to$I | LEFT-ARC |
| [ROOT, parsed, this] | [sentence, correctly] | | SHIFT |
| [ROOT, parsed, this, sentence] | [correctly] | | SHIFT |
| [ROOT, parsed, sentence] | [correctly] | sentence$\to$this | LEFT-ARC |
| [ROOT, parsed] | [correctly] | parsed$\to$sentence | RIGHT-ARC |
| [ROOT, parsed, correctly] | [] | | SHIFT |
| [ROOT, parsed] | [] | parsed$\to$correctly | RIGHT-ARC |
| [ROOT] | [] | ROOT$\to$parsed | RIGHT-ARC |
(b)
A sentence of $n$ words is parsed in $O(n)$ transitions: each word is shifted onto the stack exactly once ($n$ SHIFTs), and each word is attached to a head exactly once ($n$ arc transitions), for $2n$ transitions in total.
(c)
`__init__`:

```python
### YOUR CODE HERE (3 Lines)
### Your code should initialize the following fields:
###     self.stack: The current stack represented as a list with the top of the stack as the
###                 last element of the list.
###     self.buffer: The current buffer represented as a list with the first item on the
###                 buffer as the first item of the list
###     self.dependencies: The list of dependencies produced so far. Represented as a list of
###             tuples where each tuple is of the form (head, dependent).
###             Order for this list doesn't matter.
###
### Note: The root token should be represented with the string "ROOT"
###
self.stack = ["ROOT"]
self.buffer = sentence[:]   # copy the word list so parsing does not mutate the input
self.dependencies = []
### END YOUR CODE
`parse_step`:

```python
### YOUR CODE HERE (~7-10 Lines)
### TODO:
###     Implement a single parsing step, i.e. the logic for the following as
###     described in the pdf handout:
###         1. Shift
###         2. Left Arc
###         3. Right Arc
if transition == "S":
    # SHIFT: move the first word of the buffer onto the stack
    self.stack.append(self.buffer.pop(0))
elif transition == "LA":
    # LEFT-ARC: second item on the stack becomes a dependent of the top item
    self.dependencies.append((self.stack[-1], self.stack[-2]))
    self.stack.pop(-2)
else:
    # RIGHT-ARC: top item on the stack becomes a dependent of the second item
    self.dependencies.append((self.stack[-2], self.stack[-1]))
    self.stack.pop(-1)
### END YOUR CODE
```
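As a sanity check, the transition sequence from the table in (a) can be replayed on a minimal stand-in class that mirrors the same logic (`MiniParse` is my own self-contained stub, not the assignment's `PartialParse`):

```python
class MiniParse:
    """Minimal stand-in for PartialParse, mirroring the parse_step logic."""
    def __init__(self, sentence):
        self.stack = ["ROOT"]
        self.buffer = sentence[:]
        self.dependencies = []

    def parse_step(self, transition):
        if transition == "S":              # SHIFT
            self.stack.append(self.buffer.pop(0))
        elif transition == "LA":           # LEFT-ARC
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)
        else:                              # RIGHT-ARC
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop(-1)

sentence = ["I", "parsed", "this", "sentence", "correctly"]
pp = MiniParse(sentence)
# The 2n = 10 transitions from the table in part (a)
for t in ["S", "S", "LA", "S", "S", "LA", "RA", "S", "RA", "RA"]:
    pp.parse_step(t)
```

After the final RIGHT-ARC only ROOT remains on the stack, and the five recorded dependencies match the "New dependency" column of the table.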
(d)
`minibatch_parse`:

```python
### YOUR CODE HERE (~8-10 Lines)
### TODO:
###     Implement the minibatch parse algorithm as described in the pdf handout
###
###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
###                 unfinished_parses = partial_parses[:].
###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
###             contains references to the same objects. Thus, you should NOT use the `del` operator
###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
###             is being accessed by `partial_parses` and may cause your code to crash.
partial_parses = [PartialParse(sentence) for sentence in sentences]
unfinished_parses = partial_parses[:]   # shallow copy: both lists share the same objects
while unfinished_parses:
    batch = unfinished_parses[:batch_size]
    transitions = model.predict(batch)
    for parse, transition in zip(batch, transitions):
        parse.parse_step(transition)
        if len(parse.stack) == 1:       # only ROOT left: this parse is complete
            unfinished_parses.remove(parse)
dependencies = [parse.dependencies for parse in partial_parses]
### END YOUR CODE
```
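The batching logic can be exercised end to end with a dummy model (everything here is a self-contained sketch: `MiniParse` is a stub with the same interface as `PartialParse`, and `DummyModel` is a made-up policy that shifts while the buffer is non-empty and otherwise emits RIGHT-ARC, producing a right-branching parse):

```python
class MiniParse:
    def __init__(self, sentence):
        self.stack, self.buffer, self.dependencies = ["ROOT"], sentence[:], []

    def parse_step(self, t):
        if t == "S":
            self.stack.append(self.buffer.pop(0))
        elif t == "LA":
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)
        else:
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop(-1)

class DummyModel:
    """Hypothetical stand-in for the trained parser model."""
    def predict(self, parses):
        return ["S" if p.buffer else "RA" for p in parses]

def minibatch_parse(sentences, model, batch_size):
    partial_parses = [MiniParse(s) for s in sentences]
    unfinished = partial_parses[:]              # shallow copy, as in the handout
    while unfinished:
        batch = unfinished[:batch_size]
        for parse, t in zip(batch, model.predict(batch)):
            parse.parse_step(t)
            if len(parse.stack) == 1 and not parse.buffer:
                unfinished.remove(parse)        # finished: only ROOT remains
    return [p.dependencies for p in partial_parses]

deps = minibatch_parse([["a", "b"], ["c"]], DummyModel(), batch_size=2)
```

Note that removing a finished parse only shrinks `unfinished`; `partial_parses` still holds every parse, so the dependencies can be collected in the original sentence order at the end.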
(e)
`__init__`:

```python
self.embed_to_hidden = nn.Linear(self.n_features * self.embed_size, self.hidden_size)
nn.init.xavier_uniform_(self.embed_to_hidden.weight)
self.dropout = nn.Dropout(self.dropout_prob)
self.hidden_to_logits = nn.Linear(self.hidden_size, self.n_classes)
nn.init.xavier_uniform_(self.hidden_to_logits.weight)
```
`embedding_lookup`:

```python
x = self.pretrained_embeddings(t)   # (batch_size, n_features, embed_size)
x = x.view(x.size(0), -1)           # flatten to (batch_size, n_features * embed_size)
```
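For intuition, the `view(batch, -1)` flatten concatenates each example's feature embeddings into one long vector. A pure-Python sketch with hypothetical shapes (batch of 2 examples, 3 features, embedding size 4):

```python
# batch[i][f][e]: embedding component e of feature f in example i
batch = [[[f * 10 + e for e in range(4)] for f in range(3)] for _ in range(2)]
# concatenate the 3 feature embeddings of each example into one 12-dim vector
flat = [[value for feature in example for value in feature] for example in batch]
```

Each row of `flat` has length `n_features * embed_size`, matching the input dimension of `embed_to_hidden` above.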
`forward`:

```python
embeddings = self.embedding_lookup(t)       # (batch_size, n_features * embed_size)
hidden = self.embed_to_hidden(embeddings)
hidden = nn.ReLU()(hidden)                  # elementwise ReLU nonlinearity
hidden = self.dropout(hidden)               # active in train mode, identity in eval mode
logits = self.hidden_to_logits(hidden)      # (batch_size, n_classes)
```
`train`:

```python
optimizer = optim.Adam(parser.model.parameters(), lr=lr)
loss_func = nn.CrossEntropyLoss()
```
`train_for_epoch`:

```python
optimizer.zero_grad()                       # clear gradients from the previous step
logits = parser.model.forward(train_x)
loss = loss_func(logits, train_y)
loss.backward()
optimizer.step()
```
Results:
dev UAS: 88.38
test UAS: 88.90
(f)
Skipped.
Except where otherwise noted, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Doraemonzzz when reposting!